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Abstract 

Consider  the  model: 

Vi  =  /(<«)  +  z; 

where  /  is  a  decreasing  function  and  {z,}  are  assumed  to  be  a  station¬ 
ary  Gaussian  process  with  mean  zero  and  variance  a2 .  We  propose  a 
simple  thresholding  procedure  based  on  the  fact  that  the  wavelet  co¬ 
efficients  for  f,  under  Haar  basis,  are  non-negative.  We  show  that  our 
estimator  is  competitive  with  the  Grenander  estimator  both  theoreti¬ 
cally  and  numerically  (in  the  sense  of  mean-square-error). 

Key  Words  and  Phrases:  Isotonic  Regression;  Monotone  Curve; 
Grenander  Estimator;  Orthogonal  Wavelet  Transformation;  Shrinkage 
Estimator. 
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1  Introduction 


Consider  the  non-parametric  regression  model: 


Xi  =  f(ti)  +  Zi  i  =  1, n  (1) 

where  /  is  only  known  to  be  a  decreasing  function,  U  =  ijn  and  {z:}  are 
assumed  to  be  a  stationary  Gaussian  process  with  mean  zero  and  variance  a2. 
This  kind  of  regression  problem  is  referred  as  isotonic  regression.  Examples 
and  interpretations  of  such  models  can  be  found  in  Barlow  ct  al  (1972). 

Let  fi  =  f(ti)  and  /  =  (/i,...,/*)'  be  the  sampled  version  of  /.  For 
simplicity  we  assume  that  n  =  2m  is  dyadic.  Our  goal  is  to  find  an  estimate 
/  depending  only  on  Xi,  ...,Xn  with  small  mean-squared-error: 

RCU)  =  n-1YiE{fi-fi)2 
:=1 

In  this  paper,  we  will  discuss  the  following  3-step  wavelet  shrinkage  pro¬ 
cedure,  introduced  by  Donoho  and  Johnstone  (1992a,  b),  for  estimation  of 
/: 

[1]  Take  the  Haar  (the  simplest  wavelet)  transform  of  X^s  to  get  the 
empirical  wavelet  coefficients,  {yj,k} 

Let  the  wavelet  coefficients  of  /,  from  Lemma 

1  below,  ajtk  >  0  for  all  j,  k. 


[2]  Apply  the  threshold: 

&j,k  — 


(2) 


0  if  Vjyk  <  A, 

Vjjk  if  Vj,k  ^ 

to  the  empirical  wavelet  coefficients  { },  with  some  optimally  chosen 


threshlod  A j,  usually  A j  =  Oy/2  log(n)/n  for  all  levels. 
[3]  Invert  the  Haar  tranform  to  get  an  estimate  /,•  of  /,*. 


We  will  discuss  some  theoretical  properties  of  the  estimator  in  section 
2.  In  section  3,  we  will  compare  our  estimator  with  the  Grenander  estimate 
both  theoretically  and  numerically. 


2 


2 


Theoretical  Results 


The  Haar  basis  is  an  orthonormal  basis  of  ^[0, 1]  and  it  is  also  an  inter¬ 
polating  wavelet  basis  (Donoho,  1992),  therefore  the  wavelet  coefficients  for 
the  sampled  version  is  essentially  the  same  as  those  of  function  version. 

We  first  mention  some  results  on  the  function  version  coefficients.  Let 

=  -f[o, i)(0>  $(t)  —  -f[o,i/2)(*)  -  J[l/2,1)(*) 

and  define 

iM*)  =  V^(2H  -  k) 

For  /  6  ^2[0^  1]?  define  the  wavelet  coefficients 

bo  =  J  f<f>,  aj'k  =  J  fibj'k  (3) 

and  for  0  <  t  <  1, 

2J— 1 

f(t)  =  b0  +  Y,Ys 

j> 0  Jt=0 

(in  the  sense  of  L2).  Moreover,  there  is  the  extremely  useful  Parseval  rela¬ 
tion: 

11/  “  /lli2[o,i]  =  (^0  “  ^o)2  +  -  ajyk)2 

hk 


Consider  the  class  of  decreasing  functions  on  [0, 1]: 

V(C)  —  {/  :  /  decreasing  and  /( 0)  -  /( 1)  <  C } 
Lemma  1  Suppose  f  6  V{C)  and  aj^’s  are  defined  as  in  (3),  then 


ajyk  >  0 


for  all  j  >  0  and  0  <  k  <  V .  And  furthermore , 

'  2J  —  1 

2i/2  Hk  <  C/2 

k=Q 

for  all  j  >  0. 
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Proof:  For  j  >  0  and  k  —  0, -  1, 


Hk  =  Jq  f(x)1>j,k(x)dx  =  2  j/2  jf 

=  2'(i+2|/2({/(^)  ~  /(- (4) 

The  non-negativity  of  ctj,/.  is  obvious  from  (4)  and 


2i/s  E  -it  i  I E  («|)  -  /t^))  = /(0) = /(1)  <  on 


2J— 1 


/c=0 


fc=0 


This  completes  the  proof. 

Next  Lemma  gives  the  optimal  rate  under  L 2  norm: 

Lemma  2  (Minimax  L2  Risk) 

inf  sup  i?(/,  /)  x  n“2/3 
/  feV(C) 


Let 


A>(0  =  60  + 


m— 1  2J— 1 


EEa. 

j= 0  fc=0 


be  the  wavelet  estimate  of  /,  as  we  described  earlier,  the  following  Theorem 
says  that  fw  enjoys  the  optimal  (or  near  optimal)  rate  of  convergence. 

Theorem  1 


inf  sup  R{fw,f)~n  2/3 
{-\>}  jzV(C) 


For  A  j  =  CTi/2  log(n)/n, 

sup  R(fwJ)<C(^)2'3(l  +  o(l)) 
feV(C)  n 
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The  proof  is  in  the  Appendix. 


Remark:  In  practice,  cr2,  the  noise  level  of  the  data,  is  usually  unknown. 
It  is  natural  to  find  an  estimator  d,  say,  to  replace  the  unknown  a  in  the 
thresholds.  In  this  paper,  d  is  derived  from  a  linear  estimate  of  / 


m— 1  2J-1 

*2=  E  E& 

j=[m/2]  k=: 0 

Since  yj^  are  independently  N(ajiiQ1  a2 /n)  distributed, 

is  non-central  x2  distributed  with  the  non-centrality 


(5) 


na 


m-1  23-1 

^=.E  E  „ 

j=[m/2]  k—Q 

and  for  /  £  P(C),  from  Lemma  1  and  Jensen’s  inequality, 


m-l  2J— 1 


E  E  M)2  <  £(  E  irn) 


2  _ 


2C2y/n 


■  2,7 

By  calculating  the  first  and  second  momemts  of  the  non-central  x2  distri¬ 
bution,  for  X  ~  x2(<5)>  EX  =  n  +  62  and  Var(X)  =2 n  +  4<52,  we  have 

E(c2-<t2)2  =  0(n-1) 

Finally,  by  using  the  inequality 

(v5-V5)’<^ 

for  all  x  >  0  and  (fixed)  y  >  0,  we  have 

£(<7  —  a)2  =  0(n~1) 

and  this  leads  to: 

^Theorem  2  For  A j  =  dy^log^/n, 

sup  £(/„,/)  <C'(^)2/3(l  +  o(l)) 

/62>(C)  n 

The  proof  is  also  in  the  Appendix. 
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Comparisons  with  LS  Estimator 


Least  square  method  are  commonly  used  in  regression  analysis.  The  LS 
estimate  for  /  is 

/  =  arg  min  ]£(y,-  -  /,)2  (6) 

Jl£  — £/n  i_1 

Note  that  {/*}  are  also  the  maximum  likelihood  estimates  of  {/*}  when  {rt-} 
are  iid  Gaussian  random  variables. 

An  effective  algorithm,  Pool- Adjacent-Violators  Algorithm,  has  been  de¬ 
scribed  in  §1.2  of  Barlow  et  al  (1972)  and  the  resulting  estimator  is  usually 
called  the  least  concave  majorant  estimator  or  Grenander  estimator,  ref. 
Grenander  (1956),  Prakasa  Rao  (1983).  Properties  of  Grenander  estimator 
can  also  be  found  in  Berge  (1989)  and  Wang  (1991).  For  example: 

Lemma  3  Let  fc  be  the  Grenander  estimator,  then 

sup  i2(/G,/)  x  n“2/3 
feV(C) 

Le.  Grenander  estimator  achieves  the  optimal  rate. 

We  have  the  following  comparison: 

•  Wavelet  estimate  equipped  with  the  optimal  threshold  achieves  the 
optimal  rate  as  the  Grenander  estimate  does;  while  the  estimator  with 
the  simple  shrinkage  rule  of  using  A j  =  £^2 Tog (n)/n  is  within  a  factor 
(logn)2/3  of  the  optimal  rate. 

•  Computationally,  wavelet  procedure  is  more  efficient  than  Grenander 
procedure.  In  fact,  the  computational  effort  of  the  wavelet  procedure  is 
of  order  nlogn  comparing  with  order  n2  of  the  Grenander  procedure. 

*  •  Grenander  estimator  has  boundary  effects.  In  fact,  it  does  not  con¬ 

verge  at  discontinous  points,  Wang  (1991).  This  drawback  can  be 
seen  from  our  simulations,  while  the  wavelet  estimates  do  not  have 
boundary  effects. 
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As  a  final  remark,  we  should  mention  here  is  that  wavelet  procedure  does 
not  guarantee  us  a  decreasing  solution.  From  Lemma  1,  non- negativity  of 
the  Haar  coefficients  is  only  a  necessary  condition  for  /  to  be  decreasing. 

In  the  case  where  a  decreasing  solution  is  needed,  we  can  still  use  wavelet 
procedure  as  the  first  step,  since  wavelet  procedure  is  a  rate-preserve  trans¬ 
formation.  We  can  then  apply  Grenander  procedure  to  the  transformed 
data.  This  two-step  procedure  requires  less  computational  effort  than  ap¬ 
plying  Grenander  procedure  to  the  original  data. 


Numerical  examples  (Figure  1,  2,  3)  show  that  the  wavelet  estimate  is 
also  competitive  numerically  with  the  Grenander  estimate.  In  our  simula¬ 
tions,  we  compare  the  L\  and  L*i  losses  of  both  Grenander  estimator  and 
our  wavelet  estimator: 

R1  =  £  !/(*«•) -/(*.-)! 

t=l 

£2  =  Bfo)  - /(*))* 

*=i 


4  Appendix 

To  prove  our  main  theorem,  we  need  to  introduce  some  basic  facts  of  Besov 
space  theory.  Let’s  define  Besov  Body  in  sequence  space  lp 

oo  2J— 1 

©;,9(C)  =  =  (Oj,k)j>o,  0<k<2i  ■  £{2^(  £  IMT/p}  <  C] 

j= 0  fc=0 

for  q  <  oo  and 

©p,oo(C)  =  {0  =  (8j,k)j>a,  o<k<2i  :  sup{2^  £  \0j<k\p}  <  Cp} 

j>°  k=0 


Proof  of  Theorem  1:  From  Lemma  1,  it  is  obvious  that 

eD(C)c©+n©l£(c/2)  (7) 
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where  0+  =  {(Ohk) :  6j,k  >  0}. 

Therefore, 

sup  E\\fw  -  /IlLo.i]  =  sup  E\\a-a\\l 
feV(C)  aeeD(C) 

<  sup  .E||a- a||f2 

ae©+ne!fi(c/2) 

It  is  well  known  that  (Donoho  and  Johnstone,  1992a,  b) 

inf  sup  jE7| |a  -  a\\j2  x  n~2/3 
*  ae©!£(C72) 

and  for  the  shrinkage  estimates  a  equipped  with  the  optimal  thresholds,  we 
have  the  same  rate 

sup  E\\a  —  a||f2  x  rT2^ 

for  the  shrinkage  estimate  a  equipped  with  thresholds  A j  =  <7A/21og(n)/n, 
sup  E\\a  -  a\\f2  <  C'{ ^p)2/3 

a€@\'l(C/2) 

Let  a+  be  the  projection  of  a  onto  0+  fj0j^,(C/2),  then  we  have 

sup  E\\a+  -  a\\f2  x  n~ 2^3 
ae©+nej^(C/2) 

when  the  optimal  thresholds  are  used,  and 

sup  E\\a+-a\\l<C'(1-^)2'3 
a€©+fl  ©i^(C/2) 

when  thresholds  Aj  =  tri/21og(n)/n  are  used. 

It  is  obvious  that  the  coordinates  in  a+  have  the  expressions  in  (2). 

Proof  of  Theorem  2:  The  proof  follows  immediately  from  Theorem  1 
and  the  following  Lemma. 
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Lemma  4 


771-1  2J— 1 

J2  12  \E(hj,k  -  ai,k?  -  E(“j,k  -  Hk)2 1  =  o(n~2/3) 
j= 0  Jt=0 

Proof:  From  the  definitions  of  dy,*  and  ay,*, 

~  aj,*)2  ~  (oj,i  -  aj,02| 

A2  2Acr.  1|A  . 

where  zy,*  =  y/n(wjfk  -  ay,*)/<7.  So 


~  aj,k)2  ~  E(ahk  -  ay,*)2 1 

<  -  ,)»  +  ^E{ Mi»  - 


<  2-E{cr-c j)2  +  £(<7-<7)2P{z  >  A(|a1  - 


2A<j 


\-p 


<  0(^)  +  0(- 


n3/2  P{,>^A1-^)}^) 


where  z  ~  iV(0,  1),  0  <  ft  <  1/2. 
For  those  satisfying 


Vnaj,k 

aX 


<  T 


where  0  <  r  <  1.  Let  Zn  be  a  standardized  non-central  x2_^(<5n)  variate 
(i.e.  £Zn  =  0  and  Var(Zn)  =  1)  and  s  €  (r,  1), 


<  P{z  >  A(  —  A  1  —  r)} 

(7 

<  P{z  >  A(^  Al-r),a>  scr}  +  P{&  <  scr} 

<  P{z  >  A (s  -  r)}  +  P{x2n-.^{&n)  <  52w} 


A(s  -  r) 


V2rT+46l 
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"  °{n(’-r )2V^i¥)  +  0(n  ^ 

0<"  n(a~T)2  y/\ogv) 

Consider 

Nn  =  >r,0<j<m,0<k<2j} 

<J  A 

where  m  =  log2  n.  Then  for  /  €  £>(C),  from  Lemma  1, 


V2C 

y/2-1 


m  — 1  tm  — 1  2J'  —  1 

E  -*2 

j— 0  fc— 0  ajtk>raX/-s/n 


rcr\Nn 

y/ri 


and  so 


iVn< 


y/2nC 

(\/2  “  l)rcrA 


Therefore,  by  choosing  r,  /?  small  and  5  close  to  1  such  that 


(8) 


(s-r)2(  1/2- (3)  >1/6 


we  have 


771  —  1  2>-l 

y:  yi  —  aj^)2  ~  ~  ^j,^)  i 

i=o  fc=o 


i=o  fc=o 


j=0  fc=0 


=  nffe  +  0fV?^n/  V  1 _ 

n  n3/2  2-^  n(,_r)2(1/2“^)v/logn  1 

ajtjt<r*trA/>/n  a-j^yra  Af  s/n 


+  E  1}) 


iogn  y/Togn/ 


n  ^  «3/2  *'n(*-r 


+ 


ft3/2  V  n(3-r)2(l/2-^)v/l0g  n  Y  log  71 


n 


■)) 


=  o(n-2/3) 

This  completes  the  proof  of  the  Lemma. 
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